PCOS analysis

Agnes Lorenzen, Cecille Hobbs, Freja E. Klippmann, Julie Dalgaard Petersen & Mille Rask Sander

Introduction

Background

  • Polycystic ovary syndrome (PCOS) is a syndrome that occurs in women of reproductive (menstruating) age

  • Commonly documented symptoms include period pain, irregular periods, ovary-related problems and hormone imbalance

  • Patients with PCOS often have difficulty becoming pregnant and may experience complications during pregnancy

  • However, the cause of PCOS has not yet been established.

Aim

The aim of this study is to examine a data set (obtained from Kaggle) of patients with and without PCOS. The data were collected in India from 10 different hospitals.

Data handling approach

  • Raw data:
    541 observations of 45 variables

  • 01_load_data:
    Loads the raw data

  • 02_clean_data:
    • Replacing invalid cell values with NA
    • Renaming and factorizing columns
    • Splitting the data frame into body and blood measurements
    • Removing an empty column

  • 03_augment:
    • Unit conversion (inches to cm)
    • Rounding and grouping BMI
    • Converting blood type and cycle from numeric codes to character labels
    • Creating a new column for cycle/pregnancy stage
    • Merging the data frames into one file
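
Several of the cleaning and augmentation steps listed above can be illustrated in base R. The sketch below is not the project's own code: the column names (`waist_inch`, `blood_code`, `amh`), the toy values and the blood-type code mapping are hypothetical placeholders chosen for illustration.

```r
# Minimal sketch of some cleaning/augmentation steps in base R.
# All column names, values and codes are hypothetical placeholders,
# not the actual names in the Kaggle PCOS data set.
raw <- data.frame(
  waist_inch = c(30, 34, 28),
  blood_code = c(11, 13, 15),
  amh        = c("3.2", "a", "5.1"),  # "a" is an invalid entry
  stringsAsFactors = FALSE
)

# Replace invalid cells with NA: entries that cannot be parsed as numbers
# become NA (suppressWarnings hides the expected coercion warning)
raw$amh <- suppressWarnings(as.numeric(raw$amh))

# Unit conversion: 1 inch = 2.54 cm
raw$waist_cm <- raw$waist_inch * 2.54

# Recode numeric blood type codes to character labels
# (this mapping is an assumed example)
blood_labels <- c("11" = "A+", "12" = "A-", "13" = "B+", "14" = "B-",
                  "15" = "O+", "16" = "O-", "17" = "AB+", "18" = "AB-")
raw$blood_type <- unname(blood_labels[as.character(raw$blood_code)])

raw
```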

```r
library(tidyverse)

# Round BMI to one decimal and group it into the WHO adult weight categories
body_measurements <- body_measurements |>
  mutate(BMI = round(BMI, 1)) |>
  mutate(BMI_class = case_when(
    BMI < 18.5 ~ "Underweight",
    BMI >= 18.5 & BMI < 25 ~ "Normal weight",
    BMI >= 25 & BMI < 30 ~ "Overweight",
    BMI >= 30 ~ "Obesity")) |>
  # Order the categories from lowest to highest BMI
  mutate(BMI_class = factor(BMI_class,
                            levels = c("Underweight",
                                       "Normal weight",
                                       "Overweight",
                                       "Obesity"))) |>
  relocate(BMI_class, .after = BMI)
```
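
The same grouping can be written with base R's `cut()`, which makes the interval boundaries explicit. This is a minimal sketch on hypothetical BMI values, not part of the original pipeline. Setting `right = FALSE` closes each interval on the left (for example [18.5, 25)), which matches the `case_when()` conditions, and the values show that rounding before classifying moves 29.99 into the obesity class.

```r
# Hypothetical BMI values, including boundary cases and a missing value
bmi <- c(17.9, 18.5, 24.9, 25.0, 29.99, 30.0, NA)

# Round first (as in the pipeline), then bin into left-closed intervals
bmi_class <- cut(
  round(bmi, 1),
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal weight", "Overweight", "Obesity"),
  right  = FALSE
)

# Count per category, keeping NA visible
table(bmi_class, useNA = "ifany")
```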

Descriptive analysis of data

Dimensions:

```r
library(tidyverse)

PCOS <- read_tsv(file = "../data/PCOS_merged.tsv")

# Number of observations (rows) and variables (columns) in the merged data
PCOS_dim <- tibble(observations = nrow(PCOS),
                   variables = ncol(PCOS)) |>
  print()
```

Number of patients with and without PCOS:

```r
# count() already returns a tibble, so no conversion is needed
PCOS |>
  count(PCOS_diagnosis) |>
  print()
```

![Age distribution](../results/04_age_hist.png)

Body measurement data analysis

In this analysis, we examine which body measurements are associated with a PCOS diagnosis, i.e. which factors PCOS-diagnosed patients potentially have in common.

![BMI plot](../results/05_PCOS_BMI_correlation_box.png)
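
One way to quantify the difference shown in the BMI plot is a Wilcoxon rank-sum test. This is a swapped-in example: the slides do not state which test was used, and the data frame `toy` and its `"Yes"`/`"No"` coding of `PCOS_diagnosis` are made up for illustration.

```r
# Made-up example data; the real data set may code the diagnosis differently
toy <- data.frame(
  BMI = c(22.1, 24.3, 27.8, 31.2, 26.5, 23.9, 29.4, 21.7, 25.0, 28.3),
  PCOS_diagnosis = factor(c("No", "No", "Yes", "Yes", "Yes",
                            "No", "Yes", "No", "No", "Yes"))
)

# Median BMI per diagnosis group
meds <- aggregate(BMI ~ PCOS_diagnosis, data = toy, FUN = median)
meds

# Two-sided Wilcoxon rank-sum test for a shift in BMI between the groups
test <- wilcox.test(BMI ~ PCOS_diagnosis, data = toy)
test$p.value
```

A rank-based test is used here because it does not assume normally distributed BMI values; a t-test would be an equally reasonable choice if that assumption holds.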

Blood measurement data analysis

Next, we examine the blood measurements to identify which parameters appear to be related to PCOS.

PCA of blood measurements
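
A PCA of the blood measurements could be computed with base R's `prcomp()`, as in the minimal sketch below. It runs on simulated data: the column names (`FSH`, `LH`, `TSH`, `AMH`) are hypothetical placeholders, and dropping incomplete rows with `na.omit()` is an assumed way of handling missing values.

```r
# Simulated numeric "blood" columns; names are hypothetical placeholders
set.seed(1)
n <- 50
blood <- data.frame(
  FSH = rnorm(n, 6, 2),
  LH  = rnorm(n, 3, 1),
  TSH = rnorm(n, 2.5, 0.8),
  AMH = rnorm(n, 5, 2)
)
blood$AMH[3] <- NA  # prcomp() cannot handle missing values

# Drop incomplete rows, then centre and scale each variable so that
# variables measured on different scales contribute comparably
blood_complete <- na.omit(blood)
pca <- prcomp(blood_complete, center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)
```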

PCA of body measurements
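
For the body measurements, the PCA loadings show which variables drive each component, and the mean component score per diagnosis group hints at whether the groups separate. The sketch uses simulated data; `body`, `diagnosis` and the column names are hypothetical.

```r
# Simulated body measurements; names and values are hypothetical
set.seed(2)
n <- 40
body <- data.frame(
  weight_kg = rnorm(n, 60, 10),
  height_cm = rnorm(n, 158, 6),
  waist_cm  = rnorm(n, 80, 8),
  hip_cm    = rnorm(n, 95, 8)
)
diagnosis <- factor(rep(c("No", "Yes"), length.out = n))

pca_body <- prcomp(body, center = TRUE, scale. = TRUE)

# Loadings: contribution of each variable to PC1 and PC2
round(pca_body$rotation[, 1:2], 2)

# Mean PC1 score per diagnosis group; a clear difference would suggest
# that the measurements captured by PC1 differ between the groups
group_means <- tapply(pca_body$x[, "PC1"], diagnosis, mean)
group_means
```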

Discussion

Conclusion

  • No statistically significant associations with PCOS were found